InfoMagic Internet Tools 1993 July

home *** CD-ROM | disk | FTP | other *** search

/ InfoMagic Internet Tools 1993 July / Internet Tools.iso / RockRidge / ip / nfs / amd / amd-5.2 / doc / overview.tex < prev next >

Wrap

Text File | 1990-06-23 | 13.9 KB | 279 lines

% $Id: overview.tex,v 5.2 90/06/23 22:21:50 jsp Rel $ % % Copyright (c) 1989 Jan-Simon Pendry % Copyright (c) 1989 Imperial College of Science, Technology & Medicine % Copyright (c) 1989 The Regents of the University of California. % All rights reserved. % % This code is derived from software contributed to Berkeley by % Jan-Simon Pendry at Imperial College, London. % % Redistribution and use in source and binary forms are permitted provided % that: (1) source distributions retain this entire copyright notice and % comment, and (2) distributions including binaries display the following % acknowledgement: ``This product includes software developed by the % University of California, Berkeley and its contributors'' in the % documentation or other materials provided with the distribution and in % all advertising materials mentioning features or use of this software. % Neither the name of the University nor the names of its contributors may % be used to endorse or promote products derived from this software without % specific prior written permission. % THIS SOFTWARE IS PROVIDED ``AS IS'' AND WITHOUT ANY EXPRESS OR IMPLIED % WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF % MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. % % %W% (Berkeley) %G% \Chapter{Overview} \pagenumbering{arabic} \Amd\ maintains a cache of mounted filesystems. Filesystems are {\em demand-mounted} when they are first referenced, and unmounted after a period of inactivity. \Amd\ may be used as a replacement for Sun's {\bf automount}(8) \cite{usenix:automounter,sun:automount} program. It contains no proprietary source code and has been ported to numerous flavours of \Unix\ (see table \ref{table:os},~p\pageref{table:os}). \Amd\ was designed as the basis for experimenting with filesystem layout and management. Although \amd\ has many direct applications it is loaded with additional features which have little practical use. At some point the infrequently used components may be removed to streamline the production system. %\Amd\ supports the notion of {\em replicated} filesystems by evaluating %each member of a list of possible filesystem locations in parallel. %\Amd\ checks that each cached mapping remains valid. Should a mapping be %lost -- such as happens when a fileserver goes down -- \amd\ automatically %selects a replacement should one be available. The fundamental concept behind \amd\ is the ability to separate the name used to refer to a file from the name used to refer to its physical storage location. This allows the same files to be accessed with the same name regardless of where in the network the name is used. This is very different from placing {\tt /n/hostname} in front of the pathname since that includes location dependent information which may change if files are moved to another machine. By placing the required mappings in a centrally administered database, filesystems can be re-organised without requiring changes to password files, shell scripts and so on. \Section{Filesystems and Volumes} \Amd\ views the world as a set of fileservers, each containg one or more filesystems where each filesystem contains one or more {\em volumes}. Here the term volume is used to refer to a coherent set of files such as a user's home directory or a \TeX\ distribution. In order to access the contents of a volume, \amd\ must be told in which filesystem the volume resides and which host owns the filesystem. By default the host is assumed to be local and the volume is assumed to be the entire filesystem. If a filesystem contains more than one volume, then a {\em sublink} is used to refer to the sub-directory within the filesystem where the volume can be found. \Section{Volume Naming} Volume names are assumed to be unique across the entire network. A volume name is the pathname to the volume's root as known by the users of that volume. Since this name uniquely identifies the volume contents, all volumes can be named and accessed from each host, subject to administrative controls. Volumes may be replicated or duplicated. Replicated volumes contain identical copies of the same data and reside at two or more locations in the network. Each of the replicated volumes can be used interchangeably. Duplicated volumes each have the same name but contain different, though functionally identical, data. For example, {\tt /vol/tex} might be the name of a \TeX\ distribution which varied for each machine architecture. \Amd\ provides facilities to take advantage of both replicated and duplicated volumes. Configuration options allow a single set of configuration data to be shared across an entire network by taking advantage of replicated and duplicated volumes. \Amd\ can take advantage of replacement volumes by mounting them as required should an active fileserver become unavailable. \Section{Volume Binding} \Unix\ implements a namespace of hierarchically mounted filesystems. Two forms of binding between names and files are provided. A {\em hard link} completes the binding when the name is added to the filesystem. A {\em soft link} delays the binding until the name is accessed. An {\em automounter} adds a further form in which the binding of name to filesystem is delayed until the name is accessed. The target volume, in its general form, is a tuple (host, filesystem, sublink) which can be used to name the physical location of any volume in the network. When a target is referenced, \amd\ ignores the sublink element and determines whether the required filesystem is already mounted. This is done by computing the local mount point for the filesystem and checking for an existing filesystem mounted at the same place. If such a filesystem already exists then it is assumed to be functionally identical to the target filesystem. By default there is a one-to-one mapping between the pair (host, filesystem) and the local mount point so this assumption is valid. \Section{Operational Principles} \Amd\ operates by introducing new mount points into the namespace. The kernel sees these mount points as \NFS\ \cite{sun:nfs} filesystems being served by \amd. Having attached itself to the namespace, \amd\ is now able to control the view the rest of the system has of those mount points. RPC \cite{sun:rpc} calls are received from the kernel one at a time. When a {\em lookup} call is received \amd\ checks whether the name is already known. If it is not, the required volume is mounted. A symbolic link pointing to the volume root is then returned. Once the symbolic link is returned, the kernel will send all other requests direct to the mounted filesystem. If a volume is not yet mounted, \amd\ consults a configuration {\em mount-map} corresponding to the automount point. \Amd\ then makes a runtime decision on what and where to mount a filesystem based on the information obtained from the map. \Amd\ does not implement all the \NFS\ requests; only those relevant to name binding such as {\em lookup}, {\em readlink} and {\em readdir}. Some other calls are also implemented but most simply return an error code; for example {\em mkdir} always returns ``Read-only filesystem''. \Section{Mounting a Volume} Each automount point has a mount map. The mount map contains a list of key--value pairs. The key is the name of the volume to be mounted. The value is a list of locations describing where the filesystem is stored in the network. In the source for the map the value would look like \begin{quote} ${\em location}_1\ \ {\em location}_2\ \ \ldots\ \ {\em location}_n$ \end{quote} \Amd\ examines each location in turn. Each location may contain {\em selectors} which control whether \amd\ can use that location. For example, the location may be restricted to use by certain hosts. Those locations which cannot be used are ignored. \Amd\ attempts to mount the filesystem described by each remaining location until a mount succeeds or \amd\ can no longer proceed. The latter can occur in three ways: \begin{itemize} \item If none of the locations could be used, or if all of the locations caused an error, then the last error is returned. \item If a location could be used but was being mounted in the background then \amd\ marks that mount as being ``in progress'' and continues with the next request; no reply is sent to the kernel. \item Lastly, one or more of the mounts may have been {\em deferred}. A mount is deferred if extra information is required before the mount can proceed. When the information becomes available the mount will take place, but in the mean time no reply is sent to the kernel. If the mount is deferred, \amd\ continues to try any remaining locations. \end{itemize} %\Section{Task Scheduling}\label{task scheduler} % %\Amd\ provides a task scheduler to support its non-blocking semantics. %The basic operation of the scheduler is to call a procedure when %a particular event occurs. A general sleep/wakeup mechanism is used %and sub-process support is built on that. The scheduler maintains %two queues: one of blocked calls and one of callbacks waiting to %be made. %When a child process exits, its exit status is picked up by a signal %handler and a wakeup is issued on the internal job descriptor for that sub-process. %A timeout/untimeout mechanism provides for time dependent processing. \Section{Automatic Unmounting} To avoid an ever increasing number of filesystem mounts, \amd\ removes volume mappings which have not been used recently. A time-to-live interval is associated with each mapping and when that expires the mapping is removed. When the last reference to a filesystem is removed, that filesystem is unmounted. If the unmount fails, for example the filesystem is still busy, the mapping is re-instated and its time-to-live interval is extended. The global default for this grace period is controlled by the ``-w'' command-line option (\see \Ref{opt:wait}). It is also possible to set this value on a per-mount basis (\see \Ref{opt:utimeout}). \Section{Keep-alives}\label{keepalives} Use of some filesystem types requires the presence of a server on another machine. If a machine crashes then it is of no concern to processes on that machine that the filesystem is unavailable. However, to processes on a remote host using that machine as a fileserver this event is important. This situation is most widely recognised when an \NFS\ server crashes and the behaviour observed on client machines is that more and more processes hang. In order to provide the possibility of recovery, \amd\ implements a {\em keep-alive} interval timer for some filesystem types. Currently only \NFS\ makes use of this service. The basis of the \NFS\ keep-alive implementation is the observation that most sites maintain replicated copies of common system data such as manual pages, most or all programs, system source code and so on. If one of those servers goes down it would be reasonable to mount one of the others as a replacement. The first part of the process is to keep track of which fileservers are up and which are down. \Amd\ does this by sending RPC requests to the servers' \NFS\ {\sc NullProc} and checking whether a reply is returned. While the server state is uncertain the requests are re-transmitted at three second intervals and if no reply is received after four attempts the server is marked down. If a reply is received the fileserver is marked up and stays in that state for 30 seconds at which time another \NFS\ ping is sent. Once a fileserver is marked down, requests continue to be sent every 30 seconds in order to determine when the fileserver comes back up. During this time any reference through \amd\ to the filesystems on that server fail with the error ``Operation would block''. If a replacement volume is available then it will be mounted, otherwise the error is returned to the user. %\Amd\ keeps track of which servers are up and which are down. %It does this by sending RPC requests to the servers' \NFS\ {\sc NullProc} and %checking whether a reply is returned. If no replies are received after a %short period, \amd\ marks the fileserver {\em down}. %RPC requests continue to be sent so that it will notice when a fileserver %comes back up. %ICMP echo packets \cite{rfc:icmp} are not used because it is the availability %of the \NFS\ service that is important, not the existence of a base kernel. %Whenever a reference to a fileserver which is down is made via \amd\, an alternate %filesystem is mounted if one is available. Although this action does not protect user files, which are unique on the network, or processes which do not access files via \amd\ or already have open files on the hung filesystem, it can prevent most new processes from hanging. %With a suitable combination of filesystem management and mount-maps, %machines can be protected against most server downtime. This can be %enhanced by allocating boot-servers dynamically which allows a diskless %workstation to be quickly restarted if necessary. Once the root filesystem %is mounted, \amd\ can be started and allowed to mount the remainder of %the filesystem from whichever fileservers are available. \Section{Non-blocking Operation} Since there is only one instance of \amd\ for each automount point, and usually only one instance on each machine, it is important that it is always available to service kernel calls. \Amd\ goes to great lengths to ensure that it does not block in a system call. As a last resort \amd\ will fork before it attempts a system call that may block indefinitely, such as mounting an \NFS\ filesystem. Other tasks such as obtaining filehandle information for an \NFS\ filesystem, are done using a purpose built non-blocking RPC library which is integrated with \amd's task scheduler.% (\see \Ref{task scheduler}). This library is also used to implement \NFS\ keep-alives (\see \Ref{keepalives}). Whenever a mount is deferred or backgrounded, \amd\ must wait for it to complete before replying to the kernel. However, this would cause \amd\ to block waiting for a reply to be constructed. Rather than do this, \amd\ simply {\em drops} the call under the assumption that the kernel RPC mechanism will automatically retry the request.